A Novel Semantically-Time-Referrer based Approach of Web Usage Mining for Improved Sessionization in Pre-Processing of Web Log
Authors
Abstract
Web usage mining (WUM), also known as web log mining, applies data mining techniques to large volumes of web log data to extract useful and interesting user behaviour patterns, with the goal of improving web-based applications. This paper aims to improve data discovery by mining the usage data in log files. The work is done in three phases. The first and second phases, data cleaning and user identification respectively, are completed using traditional methods. The third phase, session identification, is done using three different methods. The main focus of the paper is sessionization of the log file, which is a critical step in extracting usage patterns. The proposed referrer-time and semantically-time-referrer methods overcome the limitations of traditional methods. The main advantage of the preprocessing model presented here over other methods is that it can process a text or Excel log file of any format. Experiments on three different log files indicate that the proposed semantically-time-referrer based heuristic approach achieves better results than the traditional time and referrer-time based methods. The proposed methods are not complex to use. The web log files are collected from different servers and contain the public information of visitors. In addition, this paper discusses different types of web log formats.
Keywords: Web Usage Mining; User Identification; Session Identification; Semantics; Data Cleaning; Time Heuristics; Referrer Heuristics
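To make the difference between the time heuristic and a referrer-time heuristic concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract gives neither the timeout value nor the exact referrer rule, so the 30-minute threshold, the record field names (`time`, `url`, `referrer`), and the rule "start a new session when the referrer was not already seen in the current session" are all assumptions chosen for illustration. The semantic component of the proposed method is not described in the abstract and is not modelled here.

```python
from datetime import datetime, timedelta

# Assumed threshold: the abstract does not state one.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT, use_referrer=False):
    """Split one user's time-ordered requests into sessions.

    Each request is a dict with 'time' (datetime), 'url' and 'referrer'.
    These field names are illustrative, not taken from the paper.
    """
    sessions = []
    current = []
    for req in requests:
        if not current:
            current = [req]
            continue
        # Time heuristic: a long gap since the previous request ends the session.
        timed_out = req["time"] - current[-1]["time"] > timeout
        # Assumed referrer heuristic: the request continues the session only
        # if its referrer is a page already visited in this session.
        seen = {r["url"] for r in current}
        broken_chain = use_referrer and req["referrer"] not in seen
        if timed_out or broken_chain:
            sessions.append(current)
            current = [req]
        else:
            current.append(req)
    if current:
        sessions.append(current)
    return sessions


# Hypothetical cleaned log for a single identified user.
t0 = datetime(2024, 1, 1, 10, 0)
log = [
    {"time": t0, "url": "/home", "referrer": "-"},
    {"time": t0 + timedelta(minutes=5), "url": "/products", "referrer": "/home"},
    {"time": t0 + timedelta(minutes=10), "url": "/blog", "referrer": "-"},
    {"time": t0 + timedelta(minutes=50), "url": "/cart", "referrer": "/blog"},
]

time_sessions = sessionize(log)
ref_sessions = sessionize(log, use_referrer=True)
print([len(s) for s in time_sessions])
print([len(s) for s in ref_sessions])
```

On this toy log the time-only heuristic splits only at the 40-minute gap, while the referrer-aware variant also splits at the `/blog` request, whose referrer does not appear in the ongoing session.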
Similar resources
Pre Processing of Web Logs – An Improved Approach For E-Commerce Websites
In this paper an improved approach for pre processing of web logs data has been proposed and evaluated so that it can be applied for web logs of e-commerce web sites. The resultant web log data after these pre processing steps can be used for further pattern discovery and analysis that helps to provide useful prediction to enhance e-commerce. Ideally, the input for the Web Usage Mining process ...
Path-Source Oriented Session Identification Based on Linked Referrers and Log Indexing
Web usage mining has been widely adopted in various fields such as optimizing site structure, user-behavior analysis, personalized web services and system performance tuning. Although much research has been done on web log mining algorithms and log preprocessing techniques, the study of efficient retrieval of the structured contents for web log mining is seldom reported. In this paper, we ...
Sessionization – A Vital Stage in Data Preprocessing of Web Usage Mining - A Survey
The World Wide Web has affected almost every aspect of our lives in the modern era. The Web has many unique characteristics, which make mining useful information and knowledge a challenging task. Web mining uses many data mining techniques, but it is not simply an application of traditional data mining, owing to the heterogeneity and unstructured nature of the data on the Web. Web mining tasks can be categorize...
Framework for Web Log Data Using a Learning Algorithm
With the continued growth and proliferation of Web services and Web based information systems, the volumes of user data have reached astronomical proportions. Before analyzing such data using web mining techniques, the web log has to be pre processed, integrated and transformed. As the World Wide Web is continuously and rapidly growing, it is necessary for the web miners to utilize intelligent ...
Analyzing the User Navigation Pattern from Weblogs Using Data Pre-processing Technique
In the real world, many users are attracted to online shopping, so many transactions take place on websites. A weblog contains a series of entries that are updated frequently as users access the website. Based on user interest, the data can be classified as related or unrelated. The related data can be considered as success response, but the unrelated data can be considered as...